NSF PAR Search | NSF Public Access Repository


CATO: End-to-End Optimization of ML-Based Traffic Analysis Pipelines

Wan, Gerry; Liu, Shinan; Bronzino, Francesco; Feamster, Nick; Durumeric, Zakir (April 2025, USENIX Symposium on Networked Systems Design and Implementation (NSDI))

Machine learning has shown tremendous potential for improving the capabilities of network traffic analysis applications, often outperforming simpler rule-based heuristics. However, ML-based solutions remain difficult to deploy in practice. Many existing approaches only optimize the predictive performance of their models, overlooking the practical challenges of running them against network traffic in real time. This is especially problematic in the domain of traffic analysis, where the efficiency of the serving pipeline is a critical factor in determining the usability of a model. In this work, we introduce CATO, a framework that addresses this problem by jointly optimizing the predictive performance and the associated systems costs of the serving pipeline. CATO leverages recent advances in multi-objective Bayesian optimization to efficiently identify Pareto-optimal configurations, and automatically compiles end-to-end optimized serving pipelines that can be deployed in real networks. Our evaluations show that compared to popular feature optimization techniques, CATO can provide up to 3600× lower inference latency and 3.7× higher zero-loss throughput while simultaneously achieving better model performance.
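As a small illustration of what "Pareto-optimal configurations" means in the abstract above, the sketch below filters a set of candidate pipeline configurations down to those that no other candidate beats on both latency and model performance. It is not CATO's method: the abstract says CATO uses multi-objective Bayesian optimization to search the space, whereas this is only a brute-force dominance check. The configuration names, latencies, and F1 scores are invented for the example.

```python
from typing import List, Tuple

# (name, inference latency in ms, F1 score); all values are hypothetical.
Config = Tuple[str, float, float]


def pareto_front(configs: List[Config]) -> List[Config]:
    """Return configurations not dominated by any other configuration.

    A configuration is dominated when another one is at least as good on
    both objectives (lower latency, higher F1) and strictly better on one.
    """
    front = []
    for name, lat, f1 in configs:
        dominated = any(
            (o_lat <= lat and o_f1 >= f1) and (o_lat < lat or o_f1 > f1)
            for _, o_lat, o_f1 in configs
        )
        if not dominated:
            front.append((name, lat, f1))
    # Sort by latency so the trade-off curve reads left to right.
    return sorted(front, key=lambda c: c[1])


candidates: List[Config] = [
    ("A", 5.0, 0.90),
    ("B", 2.0, 0.85),
    ("C", 6.0, 0.88),  # slower and less accurate than A
    ("D", 1.0, 0.70),
    ("E", 2.5, 0.80),  # slower and less accurate than B
]

for name, lat, f1 in pareto_front(candidates):
    print(f"{name}: latency={lat} ms, F1={f1}")
```

The check is quadratic in the number of candidates, which is fine for a handful of configurations but is exactly the kind of exhaustive evaluation that a guided search is meant to avoid.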
Free, publicly-accessible full text available April 28, 2026
Beyond Data Points: Regionalizing Crowdsourced Latency Measurements

https://doi.org/10.1145/3700416

Sharma, Taveesh; Schmitt, Paul; Bronzino, Francesco; Feamster, Nick; Marwell, Nicole P (December 2024, Proceedings of the ACM on Measurement and Analysis of Computing Systems)

Despite significant investments in access network infrastructure, universal access to high-quality Internet connectivity remains a challenge. Policymakers often rely on large-scale, crowdsourced measurement datasets to assess the distribution of access network performance across geographic areas. These decisions typically rest on the assumption that Internet performance is uniformly distributed within predefined social boundaries, such as zip codes, census tracts, or neighborhood units. However, this assumption may not be valid for two reasons: (1) crowdsourced measurements often exhibit non-uniform sampling densities within geographic areas; and (2) predefined social boundaries may not align with the actual boundaries of Internet infrastructure. In this paper, we present a spatial analysis of crowdsourced datasets for constructing stable boundaries for sampling Internet performance. We hypothesize that greater stability in sampling boundaries will better reflect the true nature of Internet performance disparities than the misleading patterns observed as a result of data sampling variations. We apply and evaluate a series of statistical techniques to: (1) aggregate Internet performance over geographic regions; (2) overlay interpolated maps with various sampling unit choices; and (3) spatially cluster boundary units to identify contiguous areas with similar performance characteristics. We assess the effectiveness of the techniques we apply by comparing the similarity of the resulting boundaries for monthly samples drawn from the dataset. Our evaluation shows that the combination of techniques we apply achieves higher similarity compared to directly calculating central measures of network metrics over census tracts or neighborhood boundaries. These findings underscore the important role of spatial modeling in accurately assessing and optimizing the distribution of Internet performance, which can better inform policy, network operations, and long-term planning decisions.
Full Text Available
The Hitchhiker's Guide to Analyzing the FCC Broadband Data Collection Datasets

Marques, Jonatas; Schrubbe, Alexis; Marwell, Nicole P; Feamster, Nick (August 2024, SSRN)

Full Text Available
Can Allowlists Capture the Variability of Home IoT Device Network Behavior?

https://doi.org/10.1109/EuroSP60621.2024.00015

He, Weijia; Bryson, Kevin; Calderon, Ricardo; Prakash, Vijay; Feamster, Nick; Huang, Danny Yuxing; Ur, Blase (July 2024, IEEE)

Full Text Available
Prediction Privacy in Distributed Multi-Exit Neural Networks: Vulnerabilities and Solutions

https://doi.org/10.1145/3576915.3623069

Kannan, Tejas; Feamster, Nick; Hoffmann, Henry (November 2023, ACM)

Full Text Available
Estimating WebRTC Video QoE Metrics Without Using Application Headers

https://doi.org/10.1145/3618257.3624828

Sharma, Taveesh; Mangla, Tarun; Gupta, Arpit; Jiang, Junchen; Feamster, Nick (October 2023, ACM)

Full Text Available
Generative, High-Fidelity Network Traces

https://doi.org/10.1145/3626111.3628196

Jiang, Xi; Liu, Shinan; Gember-Jacobson, Aaron; Schmitt, Paul; Bronzino, Francesco; Feamster, Nick (November 2023, ACM)

Recently, much attention has been devoted to the development of generative network traces and their potential use in supplementing real-world data for a variety of data-driven networking tasks. Yet, the utility of existing synthetic traffic approaches is limited by their low fidelity: low feature granularity, insufficient adherence to task constraints, and subpar class coverage. As effective network tasks are increasingly reliant on raw packet captures, we advocate for a paradigm shift from coarse-grained to fine-grained traffic generation compliant with constraints. We explore this path employing controllable diffusion-based methods. Our preliminary results suggest its effectiveness in generating realistic and fine-grained network traces that mirror the complexity and variety of real network traffic required for accurate service recognition. We further outline the challenges and opportunities of this approach, and discuss a research agenda towards text-to-traffic synthesis.
Full Text Available
Discovery Testbed: An Observational Instrument for Broadband Research

https://doi.org/10.1109/E-SCIENCE58273.2023.10254876

Keahey, Kate; Feamster, Nick; Martins, Guilherme; Powers, Mark; Richardson, Marc; Schrubbe, Alexis; Sherman, Michael (October 2023, IEEE)

Full Text Available
LEAF: Navigating Concept Drift in Cellular Networks

https://doi.org/10.1145/3609422

Liu, Shinan; Bronzino, Francesco; Schmitt, Paul; Bhagoji, Arjun Nitin; Feamster, Nick; Crespo, Hector Garcia; Coyle, Timothy; Ward, Brian (September 2023, Proceedings of the ACM on Networking)

Operational networks commonly rely on machine learning models for many tasks, including detecting anomalies, inferring application performance, and forecasting demand. Yet, model accuracy can degrade due to concept drift, whereby the relationship between the features and the target to be predicted changes. Mitigating concept drift is an essential part of operationalizing machine learning models in general, but is of particular importance in networking's highly dynamic deployment environments. In this paper, we first characterize concept drift in a large cellular network for a major metropolitan area in the United States. We find that concept drift occurs across many important key performance indicators (KPIs), independently of the model, training set size, and time interval, thus necessitating practical approaches to detect, explain, and mitigate it. We then show that frequent model retraining with newly available data is not sufficient to mitigate concept drift, and can even degrade model accuracy further. Finally, we develop a new methodology for concept drift mitigation, Local Error Approximation of Features (LEAF). LEAF works by detecting drift; explaining the features and time intervals that contribute the most to drift; and mitigating it using forgetting and over-sampling. We evaluate LEAF against industry-standard mitigation approaches (notably, periodic retraining) with more than four years of cellular KPI data. Our initial tests with a major cellular provider in the US show that LEAF consistently outperforms periodic and triggered retraining on complex, real-world data while reducing costly retraining operations.
Full Text Available
An Efficient One-Class SVM for Novelty Detection in IoT

Yang, Kun; Kpotufe, Samory; Feamster, Nick (November 2022, Transactions on Machine Learning Research)

One-Class Support Vector Machines (OCSVMs) are a set of common approaches for novelty detection due to their flexibility in fitting complex nonlinear boundaries between normal and novel data. Novelty detection is important in the Internet of Things (“IoT”) due to the potential threats that IoT devices can present, and OCSVMs often perform well in these environments due to the variety of devices, traffic patterns, and anomalies that IoT devices present. Unfortunately, conventional OCSVMs can introduce prohibitive memory and computational overhead in detection. This work designs, implements, and evaluates an efficient OCSVM for such practical settings. We extend Nyström and (Gaussian) Sketching approaches to OCSVM, combining these methods with clustering and Gaussian mixture models to achieve 15-30x speedup in prediction time and 30-40x reduction in memory requirements without sacrificing detection accuracy. Here, the very nature of IoT devices is crucial: they tend to admit few modes of normal operation, allowing for efficient pattern compression.
Full Text Available
